Abstract:
In many time-sensitive applications, knowing the classification result as early as possible while preserving accuracy is extremely important for further actions. Shapelet-based early classification methods are popular due to their natural interpretability. However, most existing shapelet-based methods ignore the distance information between the shapelets and the time series. This distance information, though it may contain some noise, reflects additional information about the relationship between the shapelets and the time series. Some existing works adopt the distance information but are not robust to the noise it contains. To tackle this challenge, we present a novel distance transformation based early classification (DTEC) framework, which transforms the original time series into a distance space. On this distance space, a probabilistic classifier is trained, and a novel classification criterion, the confidence area, is proposed to overcome the noise introduced by the training phase and the dataset. The effectiveness of the proposed framework is validated on three time series benchmarks as well as extensive datasets selected from the UCR time series archive.
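The distance transformation at the core of DTEC can be sketched as follows: each time series is mapped into a "distance space" whose coordinates are its minimum distances to a set of shapelets, and the probabilistic classifier is then trained on these vectors. This is a minimal illustration of the standard shapelet-distance computation; the function names (`sdist`, `transform`) are illustrative, not the authors' API.

```python
# Sketch: map a time series into the distance space induced by shapelets.
import math

def sdist(shapelet, series):
    """Minimum Euclidean distance between a shapelet and any
    equal-length subsequence of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((shapelet[j] - series[start + j]) ** 2 for j in range(m)))
        best = min(best, d)
    return best

def transform(series, shapelets):
    """Map a raw time series to its vector of shapelet distances."""
    return [sdist(s, series) for s in shapelets]

ts = [0.0, 0.1, 1.0, 2.0, 1.0, 0.1, 0.0]
shapelets = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
print(transform(ts, shapelets))  # first shapelet matches a subsequence exactly → distance 0.0
```

A classifier trained on such vectors sees only how close each series gets to each shapelet, which is exactly the (noisy) distance information the abstract argues should not be discarded.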
Abstract:
Recent studies show that pre-trained language models (LMs) are vulnerable to textual adversarial attacks. However, existing attack methods either suffer from low attack success rates or fail to search efficiently in the exponentially large perturbation space. We propose an efficient and effective framework, SemAttack, to generate natural adversarial text by constructing different semantic perturbation functions. In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces. Thus, the generated adversarial texts are semantically closer to the original inputs. Extensive experiments reveal that state-of-the-art (SOTA) large-scale LMs (e.g., DeBERTa-v2) and defense strategies (e.g., FreeLB) are still vulnerable to SemAttack. We further demonstrate that SemAttack is general and able to generate natural adversarial texts for different languages (e.g., English and Chinese) with high attack success rates. Human evaluations also confirm that our generated adversarial texts are natural and barely affect human performance.
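The simplest of the semantic spaces mentioned above, the typo space, can be illustrated with a small candidate generator: the perturbation function enumerates single-character edits of a word, and an attack would then search this space for a substitution that flips the victim model's prediction. This is a hedged sketch of the idea only; the function name `typo_candidates` and the edit set are hypothetical, not SemAttack's actual perturbation function.

```python
# Sketch of a "typo space" perturbation function: candidates for a word
# are its single-character substitutions, deletions, and insertions.
def typo_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All single-character edits of a word (excluding the word itself)."""
    subs = {word[:i] + c + word[i + 1:] for i in range(len(word)) for c in alphabet}
    dels = {word[:i] + word[i + 1:] for i in range(len(word))}
    ins = {word[:i] + c + word[i:] for i in range(len(word) + 1) for c in alphabet}
    return (subs | dels | ins) - {word}

cands = typo_candidates("bank")
print("tank" in cands, "bnk" in cands)  # substitution and deletion candidates
```

The knowledge and contextualized spaces in the paper play the same role, with WordNet synonyms or BERT-embedding neighbors replacing the character edits.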
Abstract:
In this paper we review the current landscape of data-driven decision making in the context of operating residential and commercial building systems with energy management objectives. First, we present results from a literature review focused on identifying new sources of data that have become available (e.g., smartphone sensors, utility smart meters) and their potential to impact the decision making processes involved in operating these facilities. Existing obstacles to realizing the full potential of these novel data sources are discussed and later explored in more depth through case studies. These include limited interoperability and standardization practices, high labor and/or maintenance costs for installing and maintaining the instrumentation, and computationally expensive inference procedures for extracting useful information from the measurements. Finally, two specific research projects that address some of these challenges are presented in detail: one on disaggregating the total electricity consumption of a building into its constituent loads for informing predictive maintenance practices; and another on standardizing meta-data about sensors and actuators in existing Building Automation Systems (BAS) so that software applications targeting building systems can be deployed in different buildings without the need for manual configuration. Our case studies reveal that the rapid proliferation of sensing/control devices, alone, will not improve the building systems being monitored or significantly alter the way these systems are managed or controlled. When data about the physical world is a commodity, it is the ability to extract actionable information from this resource that generates value and, more often than not, this process requires significant domain expertise.
Abstract:
Robust (fuzzy) extractors are very useful for, e.g., authenticated key exchange from a shared weak secret and remote biometric authentication against active adversaries. They enable two parties to extract the same uniform randomness with a "helper" string. More importantly, they have an authentication mechanism built in, so that tampering with the "helper" string will be detected. Unfortunately, as shown by Dodis and Wichs, in the information-theoretic setting, a robust extractor for an (n, k)-source requires k > n/2, which is in sharp contrast with randomness extractors, which only require k = ω(log n). Existing works either rely on random oracles or introduce a CRS and work only for CRS-independent sources (even in the computational setting). In this work, we give a systematic study of robust (fuzzy) extractors for general CRS-dependent sources. We show that in the information-theoretic setting, the same entropy lower bound holds even in the CRS model; we then show that, in the computational setting, we can have robust extractors for general CRS-dependent sources with only minimal entropy. We further extend our construction to robust fuzzy extractors. Along the way, we propose a new primitive called κ-MAC, which is unforgeable with a weak key and hides all partial information about the key (both against auxiliary input); it may be of independent interest.
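The robustness property described above, that tampering with the public helper string must be detected, can be sketched in a few lines. This is a toy illustration only: it uses an ordinary HMAC in place of the paper's κ-MAC and a hash as the extractor, and a real robust extractor must remain secure with a *weak* key, which standard HMAC does not guarantee. The function names `gen`/`rep` follow the usual fuzzy-extractor naming but are not the paper's construction.

```python
# Toy sketch of robustness: the helper string carries an authentication tag,
# so any modification of it is rejected during reproduction.
import hashlib, hmac, os

def gen(weak_secret: bytes):
    """Produce extracted randomness plus an authenticated helper string."""
    seed = os.urandom(16)                         # public extractor seed
    r = hashlib.sha256(seed + weak_secret).digest()
    tag = hmac.new(weak_secret, seed, hashlib.sha256).digest()
    return r, seed + tag                          # (output, helper)

def rep(weak_secret: bytes, helper: bytes):
    """Recompute the output; reject if the helper was tampered with."""
    seed, tag = helper[:16], helper[16:]
    expected = hmac.new(weak_secret, seed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None                               # tampering detected
    return hashlib.sha256(seed + weak_secret).digest()

w = b"low-entropy-secret"
r, helper = gen(w)
assert rep(w, helper) == r
assert rep(w, bytes([helper[0] ^ 1]) + helper[1:]) is None  # flipped bit caught
```

The paper's contribution is making this detection work even when the key has only minimal entropy and the source may depend on the CRS, which is precisely what the HMAC shortcut above cannot provide.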
Abstract:
With the development of biomedical language understanding benchmarks, Artificial Intelligence applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, and clinical diagnosis normalization, together with an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
Abstract:
Static program analysis uses sensitivity to balance precision and scalability. However, finer sensitivity does not necessarily lead to more precise results but may reduce scalability. Recently, a number of approaches have been proposed to finely tune the sensitivity of different program parts. However, these approaches are usually designed for specific program analyses, and their abstraction adjustments are coarse-grained as they directly drop sensitivity elements. In this paper, we propose a new technique, 4DM, to tune abstractions for program analyses in Datalog. 4DM merges values in a domain, allowing fine-grained sensitivity tuning. 4DM uses a data-driven algorithm for automatically learning a merging strategy for a library from a training set of programs. Unlike existing approaches that rely on the properties of a certain analysis, our learning algorithm works for a wide range of Datalog analyses. We have evaluated our approach on a points-to analysis and a liveness analysis on the DaCapo benchmark suite. Our evaluation results suggest that our technique achieves a significant speedup and negligible precision loss, reaching a good balance.
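The difference between dropping a sensitivity element and merging values within its domain can be shown with a small sketch. Here a call-site-sensitive analysis keeps its context domain but coarsens it through a merge map; in 4DM that map is learned from training programs, whereas it is hand-written below purely for illustration, and all names (`merge`, `abstract_context`) are hypothetical.

```python
# Sketch: merge values within a context domain instead of dropping the
# domain entirely. site1 and site2 collapse into one abstract context,
# while site3 keeps full sensitivity.
merge = {"site1": "grpA", "site2": "grpA", "site3": "site3"}  # learned strategy

def abstract_context(call_site):
    """Map a concrete context element to its (possibly merged) abstract value."""
    return merge.get(call_site, call_site)

# A toy relation of (variable, context) facts, as a Datalog analysis might hold.
facts = {("x", "site1"), ("x", "site2"), ("y", "site3")}
merged = {(v, abstract_context(c)) for v, c in facts}
print(sorted(merged))  # two facts about x under distinct sites become one
```

Dropping the element altogether would be the degenerate merge map that sends every site to a single value; 4DM's point is that the useful strategies lie between these two extremes.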
Abstract:
With the advancement of trusted execution environment (TEE) technologies, hardware-supported secure computing becomes increasingly popular due to its efficiency. During the protocol execution, typically, the players need to contact a third-party server for remote attestation, ensuring the validity of the involved trusted hardware component, such as Intel SGX, as well as the integrity of the computation result. When the hardware manufacturer is not fully trusted, sensitive information may be leaked to the third-party server through backdoors, steganography, kleptography, etc. In this work, we introduce a new security notion called the semi-trusted hardware model, where the adversary is allowed to passively or maliciously corrupt the hardware. Therefore, she can learn the input of the hardware component and might also tamper with its output. We then show how to utilize such semi-trusted hardware for correlated randomness teleportation. When the semi-trusted hardware is instantiated by Intel SGX, to generate 10k random OTs, our protocol is 24X and 450X faster than EMP-IKNP-ROT in the LAN and WAN settings, respectively. When SGX is used to teleport garbled circuits, the resulting two-party computation protocol is 5.3-5.7X and 43-47X faster than EMP-SH2PC in the LAN and WAN settings, respectively, for the AES-128, SHA-256, and SHA-512 evaluations. We also show how to achieve malicious security with little overhead.
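The correlated randomness being "teleported" here is the standard random-OT correlation, which can be sketched as follows: a (semi-)trusted component samples the correlation and hands each party its share. This minimal sketch shows only the correlation itself, not the SGX attestation flow or the paper's protections against a corrupted component; the function name `random_ot` is illustrative.

```python
# Sketch: a trusted component dealing out one random-OT correlation.
import secrets

def random_ot():
    """Sender gets two random messages; receiver gets a random choice
    bit and only the corresponding message."""
    m0, m1 = secrets.token_bytes(16), secrets.token_bytes(16)
    b = secrets.randbelow(2)
    sender_share = (m0, m1)
    receiver_share = (b, m1 if b else m0)
    return sender_share, receiver_share

(s0, s1), (b, mb) = random_ot()
assert mb == (s1 if b else s0)   # receiver holds exactly the chosen message
```

Once both parties hold many such correlations, later OTs (and, via garbled circuits, general two-party computation) reduce to cheap local derandomization, which is why delegating this dealing step to SGX yields the large speedups reported above.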
Abstract:
A certain amount of web traffic is attributed to web bots on the Internet. Web bot traffic has raised serious concerns among website operators, because web bots usually consume considerable resources at web servers, resulting in high workloads and longer response times, while not bringing in any profit. Even worse, the content of the pages they crawl might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood that an IP belongs to a bot, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots that differ from those of normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection for protecting valuable web content.
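The third module above, threshold estimation over per-IP bot-likelihood scores, can be sketched with a simple quantile rule: choose the cutoff under which a target fraction of IPs is kept as non-bot traffic. The abstract does not specify the paper's actual estimation mechanism, so this quantile choice and the name `estimate_threshold` are assumptions made purely for illustration.

```python
# Sketch: pick a score cutoff so a target fraction of IPs stays non-bot.
def estimate_threshold(scores, target_nonbot_fraction):
    """Score cutoff such that the target fraction of IPs scores below it
    (those are retained as non-bot traffic); the rest are flagged."""
    ranked = sorted(scores)
    idx = int(len(ranked) * target_nonbot_fraction)
    idx = min(idx, len(ranked) - 1)
    return ranked[idx]

# Hypothetical per-IP likelihoods from the decision-tree module.
scores = [0.05, 0.10, 0.20, 0.70, 0.90, 0.95, 0.99, 0.99]
t = estimate_threshold(scores, 0.5)
print(t, [s for s in scores if s >= t])  # IPs at/above the cutoff are flagged
```

Tuning `target_nonbot_fraction` is the practical knob: too low and legitimate traffic is dropped, too high and bot traffic slips through, which is why the paper frames the goal as recovering a reasonable amount of non-bot flow.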